REGRESSION WITH GIVEN MARGINALS
Richard A. Vitale

1. Introduction

Let $\Pi(F,G)$ denote the class of random vectors $(X,Y)$ with marginal distributions $F$ and $G$ ($X \sim F$, $Y \sim G$). We will consider the associated class of regression functions

$$\mathcal{M}(F,G) = \{\, m(x) = E[Y \mid X = x] : (X,Y) \in \Pi(F,G) \,\}.$$

The motivation for looking at this class is similar in spirit to that of isotonic regression (from which we will in fact borrow a result): the extent to which auxiliary information can be incorporated into the regression process. Knowledge of marginal distributions, in particular, is natural in certain types of problems. We may consider a census in which bivariate observations are collected, the marginal distributions are assumed given (as from a previous survey), and regression is desired. Alternatively, there is the problem of optimal, non-linear prediction in a time series $\{X_i\}$. If $F$ is the equilibrium distribution of the $X_i$, then the optimal one-step predictor (squared error loss) is $E[X_{i+1} \mid X_i = x] \in \mathcal{M}(F,F)$ (see [3], [5], [6] for related discussions of this problem).

In section 2, we present a characterization of $\mathcal{M}(F,G)$ for a large class of $F$ and $G$. The proof follows directly from methods in [10]. Characterizations of the type indicated have been investigated from a variety of points of view and we refer the reader to [7], [9] for other discussions and references. It can be fairly stated that the common ancestor of all such approaches is the fertile theorem of Hardy, Littlewood and Polya [4, p. 49] on the averaging properties of doubly stochastic matrices. In section 3, we investigate further the structure of $\mathcal{M}(F,G)$ by considering it as a convex subset of an appropriate Hilbert space and examining the induced projection operator. The discussion is motivated by a statistical estimation problem.

2. Characterization of $\mathcal{M}(F,G)$


In what follows we shall regard $F$ and $G$ as fixed and satisfying

(A1) $F$ and $G$ are each supported on all of $R^1$ and are invertible.

(A2) $EY^2 = \int_{-\infty}^{+\infty} y^2\, G(dy) < \infty$.

The first assumption can be weakened considerably, but we present it to avoid side-issues. The second ensures that $\mathcal{M}(F,G)$ is a subset of $L^2[(-\infty,+\infty);F]$, the Hilbert space of real-valued functions on $R^1$ square integrable with respect to the measure determined by $F$ (this can be seen directly by noting $EY^2 = E\,E[Y^2 \mid X] \ge E\,(E[Y \mid X])^2$).
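The parenthetical inequality can be written out in full. The following is a short worked version that uses only the conditional form of Jensen's inequality and the definitions above; it is offered as a reading aid rather than as part of the original argument.

```latex
\begin{align*}
\int_{-\infty}^{+\infty} m(x)^2 \, F(dx)
  &= E\bigl(E[Y \mid X]\bigr)^2 \\
  &\le E\bigl(E[Y^2 \mid X]\bigr) && \text{(conditional Jensen)} \\
  &= EY^2 = \int_{-\infty}^{+\infty} y^2 \, G(dy) < \infty && \text{(by (A2))}.
\end{align*}
```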

Turning to the characterization of $\mathcal{M}(F,G)$, we note that if $m(x) = E[Y \mid X = x] \in \mathcal{M}(F,G)$, then with the application of marginal probability transformations $U = F(X)$, $V = G(Y)$, we have $m(x) = E[G^{-1}(V) \mid U = F(x)]$, where $U$ and $V$ are each uniformly distributed on $[0,1]$. This is essentially the object of study of [10] and, with only minor modifications, the methods employed there yield the following result.

Theorem 1. The following statements are equivalent.

(i) $m \in \mathcal{M}(F,G)$.

(ii) $m$ lies in the closed convex hull (in $L^2[(-\infty,+\infty);F]$) of functions of the form $G^{-1} \circ T \circ F$.

(iii) $$\int_0^x m(F^{-1}(T(u)))\,du \ge \int_0^x G^{-1}(u)\,du$$
for all $x \in [0,1]$ (with equality at $x = 1$) and all $T \in \mathcal{T}$.

Here $\mathcal{T} = \{T : [0,1] \to [0,1] \text{ one-one, Borel-measurable, measure-preserving}\}$.

We note that if $m \circ F^{-1}$ is non-decreasing, then the strongest inequality in (iii) occurs upon taking $T(u) = u$, i.e.,

$$\int_0^x m(F^{-1}(u))\,du \ge \int_0^x G^{-1}(u)\,du.$$
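As an illustration of the $T(u) = u$ condition, the following Python sketch checks it numerically on a grid. It assumes, purely for the example, that $F = G$ is the standard normal distribution, for which $\int_0^x G^{-1}(u)\,du = -\varphi(\Phi^{-1}(x))$; the function names, grid size, and tolerance are hypothetical choices, and the check is only meaningful when $m \circ F^{-1}$ is non-decreasing.

```python
from statistics import NormalDist

STD = NormalDist()  # assumption for this example: F = G = standard normal


def integral_quantile(x: float) -> float:
    """I(x) = integral_0^x G^{-1}(u) du; for standard normal G this is -phi(Phi^{-1}(x))."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -STD.pdf(STD.inv_cdf(x))


def cumulative_m0(m, x: float, steps: int = 1000) -> float:
    """Midpoint-rule approximation of integral_0^x m(F^{-1}(u)) du."""
    if x <= 0.0:
        return 0.0
    h = x / steps
    return h * sum(m(STD.inv_cdf((i + 0.5) * h)) for i in range(steps))


def satisfies_identity_condition(m, grid_size: int = 50, tol: float = 1e-3) -> bool:
    """Check the T(u) = u inequality on a grid, plus approximate equality at x = 1."""
    for k in range(1, grid_size):
        x = k / grid_size
        if cumulative_m0(m, x) < integral_quantile(x) - tol:
            return False
    return abs(cumulative_m0(m, 1.0) - integral_quantile(1.0)) < tol
```

With these assumptions the linear function $m(x) = \rho x$ passes the check for $\rho = 0.5$ and fails it for $\rho = 2$, in line with the fact that a bivariate normal pair with standard normal marginals has regression function $\rho x$ with $|\rho| \le 1$.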

The equality condition in (iii) amounts to $\int_{-\infty}^{+\infty} m(x)\,F(dx) = \int_{-\infty}^{+\infty} y\,G(dy)$, or $Em(X) = EY$. Finally, for the projection problem it will be useful to note that the mapping $h \in L^2[(-\infty,+\infty);F] \to h \circ F^{-1} \in L^2[[0,1];\mu]$ ($\mu$ = Lebesgue measure) induces an isomorphism between the two spaces. The image $\mathcal{M}_0$ of $\mathcal{M}(F,G)$ under the mapping can be described as follows.


Corollary. The following are equivalent.

(i) $m_0 \in \mathcal{M}_0$.

(ii) $m_0$ lies in the closed convex hull (in $L^2[[0,1];\mu]$) of functions of the form $G^{-1} \circ T$.

(iii) $$\int_0^x m_0(T(u))\,du \ge \int_0^x G^{-1}(u)\,du$$
for all $x \in [0,1]$ (with equality at $x = 1$) and all $T \in \mathcal{T}$.

Proof. Change of variables.

Remark. From (ii), it is evident that for each $T \in \mathcal{T}$, $m_0 \in \mathcal{M}_0 \iff m_0 \circ T \in \mathcal{M}_0$.

3. Projection


Under the assumption $(X,Y) \in \Pi(F,G)$, a natural criterion for judging an estimate $\hat{m}(x)$ of the unknown regression function $m(x)$ is the squared error loss

$$E[\hat{m}(X) - m(X)]^2 = \int_{-\infty}^{+\infty} [\hat{m}(x) - m(x)]^2\,F(dx).$$

It is evident that this loss can be reduced (or at least made no larger) by constructing a new estimate $\tilde{m}(x)$ which is the projection of $\hat{m}$ onto the convex set $\mathcal{M}(F,G)$. For this reason, it is of interest to investigate the projection operator associated with $\mathcal{M}(F,G)$ in $L^2[(-\infty,+\infty);F]$: that is, for $h \in L^2[(-\infty,+\infty);F]$, we seek the (unique) element $\tilde{h} \in \mathcal{M}(F,G)$ which yields

$$\int_{-\infty}^{+\infty} [h(x) - \tilde{h}(x)]^2\,F(dx) = \inf_{m \in \mathcal{M}(F,G)} \int_{-\infty}^{+\infty} [h(x) - m(x)]^2\,F(dx)$$

($\sim$ throughout will denote projection in the appropriate space). A feature of this projection is that if a constant is added to $h$, then $\tilde{h}$ remains the same: this can be seen by expanding

$$\int_{-\infty}^{+\infty} [h(x) + c - m(x)]^2\,F(dx) = \int_{-\infty}^{+\infty} [h(x) - m(x)]^2\,F(dx) + c^2 + 2c \int_{-\infty}^{+\infty} h(x)\,F(dx) - 2c \int_{-\infty}^{+\infty} m(x)\,F(dx)$$

and noting that the first term alone depends on $m$ since, as we have noted, $\int_{-\infty}^{+\infty} m(x)\,F(dx) = \int_{-\infty}^{+\infty} y\,G(dy)$ for $m \in \mathcal{M}(F,G)$. This being the case, we shall have occasion to invoke the normalization

(A3) $\int_{-\infty}^{+\infty} h(x)\,F(dx) = \int_{-\infty}^{+\infty} y\,G(dy)$

and, equivalently, for $\ell = h \circ F^{-1}$,

(A3)' $\int_0^1 \ell(u)\,du = \int_0^1 G^{-1}(u)\,du$.


We now investigate the projection operator, isolating the main aspects of the argument in two lemmas. Some notation will prove to be convenient: let $I(x) = \int_0^x G^{-1}(u)\,du$ and let capitalization generally indicate integration, e.g. $L(x) = \int_0^x \ell(u)\,du$. If $A(x) \in C[0,1]$, then denote by $A^*(x)$ the convex minorant of $A$ (i.e. the greatest convex function less than or equal to $A$).
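For a function known only at equally spaced nodes, the convex minorant can be computed as the lower convex hull of the points. The following Python sketch does this with a single left-to-right scan; the function name and the equal-spacing convention are assumptions made for the illustration, not part of the text.

```python
def convex_minorant(values):
    """Convex minorant of the points (k, values[k]), k = 0..n-1, evaluated at the same nodes.

    The nodes are assumed to be equally spaced. A left-to-right scan keeps
    only the vertices of the lower convex hull; the minorant is the
    piecewise-linear interpolant through those vertices.
    """
    n = len(values)
    if n == 0:
        return []
    hull = []  # indices of lower-hull vertices
    for i in range(n):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # b is dropped when it lies on or above the chord from a to i
            cross = (b - a) * (values[i] - values[a]) - (values[b] - values[a]) * (i - a)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    out = [float(values[0])] * n
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            out[i] = values[a] + t * (values[b] - values[a])
    return out
```

For example, the sequence `[0, 1, 0]` lies above the chord joining its endpoints, so its minorant is identically zero, while the convex sequence `[0, -1, 0]` is its own minorant.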

Lemma. Let $\ell \in L^2[[0,1];\mu]$ be non-decreasing (a.e.) and satisfy (A3)'. The projection $\tilde{\ell}$ of $\ell$ onto $\mathcal{M}_0$ satisfies

$$\tilde{L}(x) = \int_0^x \tilde{\ell}(u)\,du = L(x) - (L - I)^*(x).$$

Proof. The proof will be given first for step functions and then extended.

(I) For a fixed integer $N \ge 1$, suppose that $\ell$ is of the form

$$\ell(u) = \sum_{j=0}^{N-1} \ell_j\, I_{[x_j, x_{j+1})}(u), \qquad x_j = \frac{j}{N}, \quad \ell_j \le \ell_{j+1}.$$
We argue first that it is enough to restrict attention to candidates for projection which are similarly non-decreasing step functions: given $n \in \mathcal{M}_0$, we apply the Cauchy-Schwarz inequality to get

$$\int_0^1 |\ell(u) - n(u)|^2\,du = \sum_{j=0}^{N-1} \int_{x_j}^{x_{j+1}} |\ell_j - n(u)|^2\,du \ge \frac{1}{N} \sum_{j=0}^{N-1} (\ell_j - n_j)^2$$

where $n_j = N \int_{x_j}^{x_{j+1}} n(u)\,du$. The lower bound is attained for $n(u)$ identically constant on sub-intervals. Moreover, it can further be reduced by rearranging the $n_j$ to be non-decreasing (see [4]). If $n_j^{(T)}$ are the rearranged values, then we have

$$\int_0^1 [\ell(u) - n(u)]^2\,du \ge \int_0^1 [\ell(u) - n^{(T)}(u)]^2\,du$$

where $n^{(T)}(u) = \sum_{j=0}^{N-1} n_j^{(T)}\, I_{[x_j, x_{j+1})}(u)$. We now show that $n^{(T)} \in \mathcal{M}_0$. Since $n^{(T)}(u)$ is non-decreasing (a.e.), by the remark after theorem 1, it is enough to show that $N^{(T)}(x) = \int_0^x n^{(T)}(u)\,du \ge I(x)$ with equality at $x = 1$. The latter condition follows from the normalization (A3)'. Since $I(x)$ is convex and $N^{(T)}(x)$ is piece-wise linear, it is enough to verify the inequality constraints at the nodes $\{x_k\}$. We have

$$N^{(T)}(x_k) = \int_0^{x_k} n^{(T)}(u)\,du = \frac{1}{N} \sum_{j=0}^{k-1} n_j^{(T)},$$

which is the integral of $n(u)$ over $k$ of the sub-intervals. Equivalently, it is equal to $\int_0^{x_k} n(T(u))\,du$ for some $T$ which appropriately permutes the sub-intervals. By (iii) of the corollary, this is bounded from below by $I(x_k)$.

We now have a discrete problem to solve:

minimize $\sum_{j=0}^{N-1} (\ell_j - n_j)^2$

subject to (a) the $n_j$ are non-decreasing,

and (b) $\frac{1}{N} \sum_{j=0}^{k-1} n_j \ge I(x_k)$, $k = 1, \ldots, N-1$, with equality at $k = N$.

Imposing only constraint (b), the problem is treated in [1, pp. 46-51] as a generalized isotonic regression. Letting $L$ and $\tilde{L}$ denote the partial sum vectors (scaled by $1/N$ to match $I$) of $\ell$ and the solution vector $\tilde{\ell}$ respectively and setting $I = (I(x_1), I(x_2), \ldots, I(x_N))$, we have

$$\tilde{L} = L - (L - I)^*$$

where $*$ here denotes the convex minorant of a vector. A straightforward argument shows that $\Delta_k^2 (L - I)^* \le \Delta_k^2 (L - I)$ ($\Delta_k^2$ denoting a second difference). Hence

$$\Delta_k^2 \tilde{L} = \Delta_k^2 [L - (L - I)^*] = \Delta_k^2 L - \Delta_k^2 (L - I)^* \ge \Delta_k^2 I \ge 0.$$

It follows that $\tilde{L}$ is convex and that $\tilde{\ell}$ is non-decreasing. Thus (a) is satisfied automatically.

Translating the solution of the discrete problem into step function terms, we get $\tilde{L}(x) = L(x) - (L - I)^*(x)$.
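The discrete solution $\tilde{L} = L - (L - I)^*$ can be sketched in a few lines of Python. The example below assumes, for illustration only, that $G$ is standard normal (so that $I(x) = -\varphi(\Phi^{-1}(x))$) and that the input heights are non-decreasing and already satisfy (A3)'; all function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # assumption for illustration: G is standard normal


def normal_I(x):
    """I(x) = integral_0^x G^{-1}(u) du; equals -phi(Phi^{-1}(x)) for standard normal G."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -_STD.pdf(_STD.inv_cdf(x))


def convex_minorant(values):
    """Lower convex hull of the points (k, values[k]), evaluated at the nodes."""
    n = len(values)
    hull = []
    for i in range(n):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (b - a) * (values[i] - values[a]) - (values[b] - values[a]) * (i - a)
            if cross <= 0:  # b lies on or above the chord from a to i
                hull.pop()
            else:
                break
        hull.append(i)
    out = [float(values[0])] * n
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            out[i] = values[a] + (i - a) / (b - a) * (values[b] - values[a])
    return out


def project_monotone_steps(levels, I=normal_I):
    """Project a non-decreasing step function with heights `levels` on [j/N, (j+1)/N).

    Assumes (A3)' holds, i.e. mean(levels) equals I(1). Returns the heights
    of the projected step function.
    """
    N = len(levels)
    L = [0.0]
    for v in levels:  # L[k] = integral of the step function over [0, k/N]
        L.append(L[-1] + v / N)
    diff = [L[k] - I(k / N) for k in range(N + 1)]
    minorant = convex_minorant(diff)
    L_tilde = [a - m for a, m in zip(L, minorant)]
    return [N * (L_tilde[k + 1] - L_tilde[k]) for k in range(N)]
```

With $N = 2$ the heights $(-0.5, 0.5)$ already satisfy the constraint and are returned unchanged, whereas $(-1, 1)$ is pulled back to $(-2\varphi(0), 2\varphi(0))$, the boundary of the feasible set.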
(II) If $\ell(u)$ is not a step function, then for each $N \ge 1$, approximate $\ell(u)$ with

$$\ell_N(u) = \sum_{j=0}^{N-1} \left[ N \int_{x_j}^{x_{j+1}} \ell(u)\,du \right] I_{[x_j, x_{j+1})}(u).$$

By (I), we have

(1) $\tilde{L}_N(x) = L_N(x) - (L_N - I)^*(x)$.

Now as $N \to \infty$, $\ell_N \to \ell$ and $\tilde{\ell}_N \to \tilde{\ell}$ in $L^2[[0,1];\mu]$. Since $\left| \int_0^x \ell_N(u)\,du \right| \le \int_0^x |\ell_N(u)|\,du \to \int_0^x |\ell(u)|\,du$, the dominated convergence theorem yields $L_N(x) \to L(x)$. Similarly, $\tilde{L}_N(x) \to \tilde{L}(x)$. Further, since $L_N \to L$ uniformly and $*$ operates continuously in the uniform norm, $(L_N - I)^* \to (L - I)^*$. Taking limits ($N \to \infty$) in (1) yields the lemma.

If $\ell$ is not monotone, then some additional preparation is required to obtain its projection on $\mathcal{M}_0$. For $\ell \in L^2[[0,1];\mu]$, define $\ell^{\dagger} \in L^2[[0,1];\mu]$ as the increasing rearrangement of $\ell$. There exists a measure-preserving transformation $S_\ell : [0,1] \to [0,1]$, not necessarily one-one, such that $\ell = \ell^{\dagger} \circ S_\ell$ ([8]).
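For a step function on equal sub-intervals the rearrangement and the accompanying transformation reduce to a sort and a permutation. The following Python sketch illustrates this special case; the function name and the list-of-heights representation are assumptions, and in this discrete setting the transformation happens to be one-one, which need not hold in general.

```python
def increasing_rearrangement(levels):
    """Sort step heights and record where each original sub-interval is sent.

    Returns (sorted_levels, S) such that levels[j] == sorted_levels[S[j]]
    for every j. Here S maps sub-interval j onto sub-interval S[j] by
    translation, so it preserves Lebesgue measure.
    """
    order = sorted(range(len(levels)), key=lambda j: levels[j])  # stable sort
    S = [0] * len(levels)
    for rank, j in enumerate(order):
        S[j] = rank
    return [levels[j] for j in order], S
```

For instance, the heights `[3, 1, 2]` rearrange to `[1, 2, 3]` with `S = [2, 0, 1]`, and composing the sorted heights with `S` recovers the original sequence.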

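Lemma. Let $\ell \in L^2[[0,1];\mu]$ and satisfy (A3)'. Then if $\tilde{\ell}$ and $\widetilde{\ell^{\dagger}}$ are the projections of $\ell$ and $\ell^{\dagger}$ respectively onto $\mathcal{M}_0$,

$$\tilde{\ell} = \widetilde{\ell^{\dagger}} \circ S_\ell.$$

Remark. The construction for $\widetilde{\ell^{\dagger}}$ has been given in the previous lemma.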
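Proof. If $\ell \in L^2[[0,1];\mu]$, then $\ell^{\dagger} \in L^2[[0,1];\mu]$. Using a change of variables, we have

$$\int_0^1 [\ell^{\dagger}(u) - g(u)]^2\,du = \int_0^1 [\ell(u) - (g \circ S_\ell)(u)]^2\,du$$

and taking infima over $g \in \mathcal{M}_0$,

$$\int_0^1 [\ell^{\dagger}(u) - \widetilde{\ell^{\dagger}}(u)]^2\,du = \inf_{g \in \mathcal{M}_0} \int_0^1 [\ell(u) - (g \circ S_\ell)(u)]^2\,du = \int_0^1 [\ell(u) - (\widetilde{\ell^{\dagger}} \circ S_\ell)(u)]^2\,du.$$

The lemma will follow if we can show

(i) $\inf_{g \in \mathcal{M}_0} \int_0^1 [\ell(u) - (g \circ S_\ell)(u)]^2\,du = \inf_{g \in \mathcal{M}_0} \int_0^1 [\ell(u) - g(u)]^2\,du$

and

(ii) $\widetilde{\ell^{\dagger}} \circ S_\ell \in \mathcal{M}_0$.

Each is a consequence of the identity $\mathcal{M}_0 \circ S_\ell = \mathcal{M}_0$, that is, $g \circ S_\ell \in \mathcal{M}_0 \iff g \in \mathcal{M}_0$. The point of interest is that $S_\ell$ may not be one-one. However, Brown [2, theorem 3] has shown that there exists a sequence $\{T_n\} \subset \mathcal{T}$ such that $g \circ T_n \to g \circ S_\ell$. Accordingly, if $g \in \mathcal{M}_0$, then $g \circ T_n \in \mathcal{M}_0$ (see the remark after the corollary of section 2) and since $\mathcal{M}_0$ is closed, $\lim_{n \to \infty} g \circ T_n = g \circ S_\ell \in \mathcal{M}_0$. Conversely, if $g \circ S_\ell \in \mathcal{M}_0$, then using an approximating sequence $\{T_n\}$,

$$\| g \circ S_\ell - g \circ T_n \|_{L^2[[0,1];\mu]} = \| g \circ S_\ell \circ T_n^{-1} - g \|_{L^2[[0,1];\mu]} \to 0.$$

Since $g \circ S_\ell \circ T_n^{-1} \in \mathcal{M}_0$ for each $n$ and $\mathcal{M}_0$ is closed, we have $g \in \mathcal{M}_0$.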
We can now state our main result.

Theorem 2. Let $h \in L^2[(-\infty,+\infty);F]$ and satisfy (A3). Let $(h \circ F^{-1})^{\dagger}$ be the increasing rearrangement of $h \circ F^{-1}$ with $h \circ F^{-1} = (h \circ F^{-1})^{\dagger} \circ S$. Then the projection $\tilde{h}$ of $h$ onto $\mathcal{M}(F,G)$ is given by

$$\tilde{h} = \widetilde{(h \circ F^{-1})^{\dagger}} \circ S \circ F$$

where $\widetilde{(h \circ F^{-1})^{\dagger}}$ satisfies

$$\int_0^x \widetilde{(h \circ F^{-1})^{\dagger}}(u)\,du = J_1(x) - J_2^*(x)$$

and $J_1(x) = \int_0^x (h \circ F^{-1})^{\dagger}(u)\,du$, $J_2(x) = J_1(x) - \int_0^x G^{-1}(u)\,du$.

Proof. Together with the indicated isomorphism between $L^2[[0,1];\mu]$ and $L^2[(-\infty,+\infty);F]$, the statement combines the two lemmas.
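The recipe of Theorem 2 can be sketched end to end for a step function on the $u = F(x)$ scale. The Python example below is a minimal illustration under stated assumptions: $G$ is taken to be standard normal, the input is a list of step heights of $h \circ F^{-1}$ on equal sub-intervals, the normalization (A3) is enforced by recentring (which, as noted earlier, does not change the projection), and every function name is hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # assumption for illustration: G is standard normal


def normal_I(x):
    """integral_0^x G^{-1}(u) du = -phi(Phi^{-1}(x)) for standard normal G."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -_STD.pdf(_STD.inv_cdf(x))


def convex_minorant(values):
    """Lower convex hull of the points (k, values[k]), evaluated at the nodes."""
    n = len(values)
    hull = []
    for i in range(n):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (b - a) * (values[i] - values[a]) - (values[b] - values[a]) * (i - a)
            if cross <= 0:  # b lies on or above the chord from a to i
                hull.pop()
            else:
                break
        hull.append(i)
    out = [float(values[0])] * n
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            out[i] = values[a] + (i - a) / (b - a) * (values[b] - values[a])
    return out


def project_steps(levels, I=normal_I):
    """Project step heights of h o F^{-1} (on [j/N, (j+1)/N)) following Theorem 2."""
    N = len(levels)
    # Recentre so that (A3)' holds; adding a constant leaves the projection unchanged.
    shift = I(1.0) - sum(levels) / N
    centred = [v + shift for v in levels]
    # Increasing rearrangement: order[rank] is the sub-interval sent to position rank.
    order = sorted(range(N), key=lambda j: centred[j])
    J1 = [0.0]
    for j in order:
        J1.append(J1[-1] + centred[j] / N)
    J2 = [J1[k] - I(k / N) for k in range(N + 1)]
    J2_star = convex_minorant(J2)
    cumulative = [a - m for a, m in zip(J1, J2_star)]  # J1 - J2*
    projected_sorted = [N * (cumulative[k + 1] - cumulative[k]) for k in range(N)]
    # Compose with S: sub-interval j receives the value at its rank.
    out = [0.0] * N
    for rank, j in enumerate(order):
        out[j] = projected_sorted[rank]
    return out
```

To evaluate the resulting estimate at a point $x$, one would read off the height on the sub-interval containing $F(x)$. Under the assumptions above, the inputs `[1.0, -1.0]` and `[4.0, 2.0]` give the same output, reflecting the invariance of the projection under added constants.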


4. Concluding Remarks

We have investigated the structure of $\mathcal{M}(F,G)$ through a characterization result and an examination of the induced projection operator. Despite the rather formidable description of the latter, computational versions have proved to be accessible. In particular, the operations $*$ and $\dagger$ together with the extraction of the measure-preserving transformation $S$ are reasonably straightforward (a discussion of some relevant algorithms can be found in [1]).

As in isotonic regression, the fact that analytical resources are available to attack the problem investigated here suggests that other nonlinear regression problems may be amenable to similar treatment.


REFERENCES

[1] Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. (1972). Statistical Inference Under Order Restrictions. Wiley, New York.

[2] Brown, J. R. (1966). Approximation theorems for Markov operators. Pacific J. Math. 16, 13-23.

[3] Grenander, U., McClure, D. E., and Vitale, R. A. (1969). Prediction. DAM and CCIS Report, Brown University.

[4] Hardy, G. H., Littlewood, J. E., and Polya, G. (1967). Inequalities (second edition). Cambridge University Press.

[5] Jaglom, A. M. (1971). Examples of optimal nonlinear extrapolation of stationary random processes. Selected Transl. Math. Statist. and Prob., 273-298.

[6] Masani, P., and Wiener, N. (1959). Non-linear prediction. In Probability and Statistics, U. Grenander ed., 190-212.

[7] Ryff, J. V. (1965). Orbits of $L^1$-functions under doubly stochastic transformations. Trans. Am. Math. Soc. 117, 92-100.

[8] Ryff, J. V. (1970). Measure preserving transformations and rearrangements. J. Math. Anal. Appl. 31, 449-458.

[9] Strassen, V. (1964). The existence of probability measures with given marginals. Ann. Math. Stat. 36, 423-439.

[10] Vitale, R. A., and Pipkin, A. C. (1976). Conditions on the regression function when both variables are uniformly distributed. Ann. Prob. 4, 869-873.


REPORT DOCUMENTATION PAGE

Title: Regression with Given Marginals
Author: Richard A. Vitale
Performing organization: Mathematics Research Center, University of Wisconsin, 610 Walnut Street, Madison, Wisconsin 53706
Contract or grant number: DAAG29-7…
Controlling office: U. S. Army Research Office, P.O. Box 12211, Research Triangle Park, North Carolina 27709
Report date: January 1977
Number of pages: 13
Security classification: UNCLASSIFIED
Distribution statement: Approved for public release; distribution unlimited.
Key words: regression; nonlinear prediction; isotonic regression; convex minorant; rearrangement of a function

Abstract: We consider the class of regression functions $\mathcal{M}(F,G) = \{m(x) = E[Y \mid X = x], (X,Y) \in \Pi(F,G)\}$ where $\Pi(F,G)$ denotes the set of random vectors with marginal distributions $F$ and $G$. A characterization of $\mathcal{M}(F,G)$ is given together with a representation for the projection operator it induces in an appropriate Hilbert space. Applications are …

